The data outsourcing services provided by cloud storage have greatly reduced the headache of data management for users, but the 
issue of remote data integrity poses further security concerns and computing burdens. The introduction of a third-party auditor 
(TPA) frees data owners from the auditing burden and alleviates disputes over the audit results between data owners and cloud 
storage providers. However, malicious cloud servers may collude with TPAs to deceive users for financial profits. Hiring multiple 
auditors in a single audit assignment appears to be a method to address the above problem, but the ensuing voting issues need to be 
further explored. In this paper, we propose a smart contract-based outsourced data integrity auditing scheme for multiauditor 
scenarios. Whereas some existing schemes use reputation-like factors as voting weights, auditors in our scheme vote equally 
and audit as they go, without any maintenance. This mechanism not only frees auditors from chores unrelated to auditing but 
also avoids the centralization caused by excessive voting weights. The challenge used to check the integrity of 
the outsourced data is jointly generated by each involved auditor. Any collusion would be detected as long as there exists more 
than one honest auditor in the audit. We implement and deploy the scheme as Ethereum smart contracts. With the help of 
blockchain, the entire auditing process is public and transparent. Both the generated data and the obtained results are persisted 
with immutability, which ensures the traceability of all historical audits. Comprehensive theoretical and experimental analyses 
demonstrate that our scheme meets the claimed targets with high efficiency and low gas costs. 


1. Introduction 


With the rapid development of the information age, individuals 
and organizations have produced a large amount of data. By 
2025, the amount of data generated globally is expected to 
reach 463 exabytes each day [1]. Traditional local storage 
models can no longer meet the management needs of such a 
massive volume. Cloud storage is quickly attracting the 
attention of users for its scalability, low cost, and location 
independence [2]. With technologies such as virtualization, cloud 
storage converges loose nodes into a powerful platform to 
provide unified services to users. Today, more and more 
people are willing to migrate their local data to leased cloud 
storage [3]. However, once the data is uploaded to the cloud 
storage, the owner completely loses control over the data. 
Owners are obliged to access the data through the interface 
provided by cloud storage servers, and they have to rely entirely 
on the cloud storage to ensure the integrity of their data. 
Unfortunately, even though cloud storage employs a variety 
of advanced technologies to guarantee the reliability and 
robustness of users’ data, corruption caused by hardware 
failure, management errors, or external attacks still occurs 
[4]. What is worse, malicious servers may even delete the 
data that is rarely accessed by users in order to free up more 
storage space to gain greater profit. In addition, once data 
integrity has been compromised, intentionally or otherwise, 
dishonest storage servers tend to conceal the incidents to 
protect their reputation. How to effectively verify the 
integrity of data stored in cloud storage has therefore 
become a research hotspot. 


In order to address this problem, several remote data in- 
tegrity auditing schemes have been proposed [5-12]. These 
schemes enable users to efficiently audit their data’s integrity 
without a complete download. To achieve this, a user needs to 
divide the original file into blocks and then generate a tag for 
each block, which is used to verify the integrity of its corre- 
sponding block. When launching a file audit, a challenge will be 
generated and then sent to the storage server. The challenge 
contains a collection of selected block indexes and a collection 
of random numbers corresponding to the indexes. On re- 
ceiving the challenge, the cloud server picks the data blocks 
specified in the challenge and computes them together with the 
random numbers to obtain an integrity proof. By verifying the 
proof, the data owner can determine whether the cloud server 
is actually keeping the data intact. 
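
Why challenging only a sample of blocks is enough can be illustrated with a short calculation. The sketch below is not part of the scheme; it assumes that the c challenged indexes are drawn uniformly without replacement from n blocks, t of which are corrupted, and the function name `detection_probability` is a hypothetical helper.

```python
from math import comb


def detection_probability(n: int, t: int, c: int) -> float:
    """Chance that c distinct challenged blocks include at least one of the
    t corrupted blocks in a file of n blocks (uniform sampling assumed)."""
    if not (0 <= t <= n and 0 <= c <= n):
        raise ValueError("require 0 <= t <= n and 0 <= c <= n")
    # Probability that every challenged block is intact.
    # comb(n - t, c) is 0 when c > n - t, so the miss probability is then 0.
    miss = comb(n - t, c) / comb(n, c)
    return 1.0 - miss


if __name__ == "__main__":
    # A file of 10,000 blocks with 1% of the blocks corrupted.
    for c in (100, 300, 460):
        print(c, round(detection_probability(10_000, 100, c), 4))
```

Under these assumptions, a few hundred challenged blocks already expose a server that has lost 1% of a large file with high probability, which is why a complete download is unnecessary.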

To get rid of tedious audit routines and complex calcu- 
lations, data owners would like to delegate TPAs to conduct 
audit tasks. However, introducing a third party poses additional 
risks that malicious cloud servers may try to trick their users by 
colluding with auditors. Employing multiple auditors on an 
audit assignment and determining the final audit outcome 
based on the votes of all participants can mitigate this collusion, 
but how to design a reasonable voting mechanism with 
multiple untrusted participants is still challenging. 

The common method to deal with inconsistent voting 
results in a multiparticipant scenario is weighted voting, where a 
weight is supposed to be maintained for each auditor, which is 
typically represented by reputation. The weight of an auditor 
stands for the extent to which his vote influences the final result. 
In addition, when an auditor’s vote is consistent with the final 
result, his reputation increases; otherwise, it decreases. Intuitively, 
weighted voting hopes to build a virtuous ecosystem 
where honest auditors always tell the truth and their 
reputations keep rising. Conversely, dishonest auditors 
who are caught cheating receive a reduction in reputation as 
punishment, and their reputation keeps declining as the 
cheating continues. At this rate, a few reputable auditors in the 
system are bound to become “elders,” and their excessive voice 
will gradually centralize the system. In contrast to weighted 
voting, the result of nonweighted voting depends only on the 
number of votes each candidate receives, and the only thing that 
needs to be considered is membership, namely, who can vote 
and who is not allowed to vote in the system. If there is no 
threshold for the voting, malicious attackers can easily generate 
a large number of accounts with a very low cost to be involved in 
an audit, directly affecting the final result by an overwhelming 
numerical advantage. This type of attack is known as the Sybil 
attack [13], which is common on peer-to-peer networks. 
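
The two failure modes described above can be made concrete with a toy tally. The sketch below is illustrative only: the auditor names, the reputation values, and the `tally` helper are hypothetical and are not taken from any cited scheme.

```python
from collections import defaultdict


def tally(votes, weights=None):
    """Return the winning result. votes maps auditor -> bool;
    weights maps auditor -> voting weight (every vote counts 1 if omitted)."""
    totals = defaultdict(float)
    for auditor, result in votes.items():
        totals[result] += 1.0 if weights is None else weights[auditor]
    return max(totals, key=totals.get)


# Two colluding "elders" claim the data is intact; three honest auditors disagree.
votes = {"elder1": True, "elder2": True, "h1": False, "h2": False, "h3": False}
reputation = {"elder1": 10.0, "elder2": 10.0, "h1": 1.0, "h2": 1.0, "h3": 1.0}

weighted_outcome = tally(votes, reputation)   # elders outweigh the honest majority
equal_outcome = tally(votes)                  # one auditor, one vote

# Without a membership threshold, cheap fake accounts flip an equal-weight vote.
sybil_votes = dict(votes, **{f"fake{i}": True for i in range(4)})
sybil_outcome = tally(sybil_votes)

print(weighted_outcome, equal_outcome, sybil_outcome)
```

The first outcome shows how accumulated reputation lets a minority decide the result, and the last one shows why equal voting still needs a membership mechanism.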

In summary, the introduction of multiple auditors may 
somewhat mitigate the collusion, but the problem has not 
been fundamentally solved and the following threats remain. 


(1) In weighted voting, the mechanism leads the 
system to become progressively centralized. The collusion 
of a few reputable auditors is enough to sway the final 
outcome, even if honest auditors outnumber them. 
This reduces the cost for malicious cloud servers 
to do evil, while also weakening user confidence in 
the auditing system. 
(2) In nonweighted voting, without a reasonable 
membership mechanism for the system, Sybil attacks can be 
easily launched. Malicious cloud servers can generate 
or buy large numbers of audit accounts to promote 
their desired results. 


(3) Collusion between malicious cloud servers and 
auditors causes the detection of corrupted data to fail. 
This collusion is undetectable because there is no way 
to distinguish whether an auditor’s challenge is 
randomly generated or deliberately constructed. The 
cloud server can thus “truly” pass the proof verification 
while retaining only a small, specific subset of the data 
blocks. 


1.1. Motivation. As mentioned above, in contrast to 
weighted voting, which inevitably leads to centralization, 
nonweighted voting only needs a reasonable 
membership mechanism to avoid Sybil attacks. We therefore 
write the whole auditing process in the 
form of smart contracts and deploy it to Ethereum, where 
any externally owned account (also known as a user account) 
can participate by simply paying a deposit. Another benefit 
of using smart contracts as the carrier for the multiauditor 
scenario is that it makes the process public, transparent, and 
traceable. The participants, details of the execution process, 
intermediate data, and the final result of the audit assignment 
are permanently recorded on the blockchain. Anyone can 
look up any historical audit without worrying about 
loss or manipulation. 


1.2. Contribution. Based on the above motivation, we design 
a remote data integrity audit scheme based on Ethereum 
smart contracts with the following features: 


(i) Cheating resistance. A server that does not retain the 
user data completely cannot pass the data 
integrity audit by spoofing. This is the basic security requirement 
for remote data integrity auditing. 


(ii) Smart contract-based audit. The auditing process is 
scheduled by smart contracts. Any Ethereum ex- 
ternally owned account can participate in the audit 
and has nothing to maintain, namely, audit as you 
go. Every single audit instance is persisted on the 
blockchain, which ensures public transparency and 
traceability. 


(iii) Collusion resistance. We propose an aggregated 
challenge generation algorithm, where the final 
challenge is composed of the shares independently 
submitted by each auditor. Thus, as long as 
there exists at least one honest auditor, the challenge 
cannot be generated as malicious auditors 
might expect. We also design a nonweighted 
voting mechanism, namely “one person, one vote.” 
When the audit results are inconsistent, 
arbitration is enforced; the honest are 
rewarded and the dishonest punished. 
1.3. Related Works. Traditional remote data integrity verification 
mechanisms fall into two main types: one is provable 
data possession (PDP) and the other is proof of retrievability 
(PoR). In PDP, the user requests a proof by sending a challenge on some 
randomly selected blocks to the server and then determines 
the integrity of the remote data by verifying that the proof is 
correct. The PoR scheme stores each encrypted file in a cloud 
server with a set of pseudorandom blocks. The client can 
then check the integrity of the data by verifying that the 
server retains the pseudorandom blocks. In 2007, Ateniese 
et al. [5] first defined PDP and proposed the PDP scheme. In 
their scheme, the data user randomly selects several blocks of 
data to verify the integrity of the data with less communi- 
cation and computational cost. If the integrity verification of 
these selected blocks passes, it can be determined that the 
server has a high probability of having complete data. Later, 
Juels et al. [14] proposed a PoR model in which the main idea 
is to embed a set of random values called “sentinels,” and the 
auditor can check the integrity of the data by checking the 
presence of sentinels at specific data points. Shacham and 
Waters proposed two PoR schemes based on the homomorphic 
linear verifier [15], which further improved the 
efficiency of the PoR scheme proposed by Juels and Kaliski 
[14]. To implement PDP on dynamic cloud data, Ateniese 
et al. proposed another PDP scheme [16], which supports all 
dynamic operations except insertion operations. Shen et al. 
[6] proposed a dynamic PDP scheme that supports fully 
dynamic operations. Later, various PDP and PoR schemes 
were proposed to extend the performance or functionality of 
traditional schemes. A number of common PoR and PDP 
schemes have emerged to enrich the integrity checking 
capabilities of outsourced data, such as deduplication [17], 
batch audit [18, 19], and data update [7, 20]. To reduce the 
computational burden on the user side, public auditing 
schemes [8, 10-12, 21-23] are proposed to allow TPAs to 
audit the integrity of their cloud data on behalf of data 
owners. To guarantee the integrity of medical data and 
reduce the burden of the data owner, Li et al. [24] propose an 
efficient, privacy-preserving public auditing protocol for 
cloud-based medical storage systems that supports the 
functions of batch auditing and dynamic update of data. This 
scheme not only saves TPA and data owner computation 
costs but also reduces the communication overhead between 
TPA and cloud servers. Considering that key retention is a 
burden for data users, Shen et al. [25] propose a new par- 
adigm called “data integrity auditing without private key 
storage,” which utilizes a linear sketch with coding and error 
correction processes to confirm the identity of the user. To 
enable data integrity auditing under the multiwriter model, 
He et al. [26] propose the first public auditing scheme for 
shared data that supports fully dynamic operations. To 
implement the new paradigm, they proposed a specially 
designed authenticated structure, called the blockless Merkle 
tree, and a novel cryptographic primitive, called permission-based 
signature. In edge computing scenarios, caching data 
on edge servers can minimize users’ data retrieval latency. 
However, this new architecture poses challenges for traditional 
data audit models. Li et al. [27] propose a new data 
structure named variable Merkle hash tree (VMHT) for 
generating the integrity proofs of those data replicas during 
the audit, which solves the above problem. Considering that 
existing schemes suffer from complex certificate 
management or key escrow problems, Gudeme et al. [28] 
propose a certificateless privacy-preserving public auditing 
scheme for dynamic shared data with group user revocation 
in cloud storage, without public key infrastructure (PKI) or 
identity-based cryptography (IBC). To verify whether an 
untrusted CSP stores all replicas in different geographic 
locations, Yu et al. [29] propose a dynamic multireplica 
auditing scheme, which verifies both the integrity and the 
geographic locations of a cloud user’s data replicas. 

Recently, blockchain has been considered as one of the 
most promising technologies to provide security support 
for IoT systems [30]. It was initially used to provide digital 
payments [31] and is now commonly used for smart 
contracts [32, 33] and data storage. The trust issues as- 
sociated with traditional data integrity verification make 
the integration of blockchain into data integrity verifica- 
tion an inevitable trend. Based on a distributed data 
storage blockchain, Zhang et al. [34] proposed a privacy- 
preserving electronic health record (EHR) public auditing 
scheme to prevent malicious behavior by TPA. However, it 
does not support batch auditing and data updates. Liu et al. 
[35] proposed to apply blockchain to avoid the use of TPA, 
and Yue et al. [36] proposed a blockchain-based frame- 
work that attempts to obtain trustworthy audit results. 
They all lack the necessary considerations to ensure the 
credibility of the results of off-chain events. Kun et al. [37] 
implemented private blockchain-based data validation in 
an untrustworthy environment, but their solution requires 
building and deploying a private blockchain, which is very 
difficult in practice. Zhou et al. [38] proposed a witnessing 
model to credibly enforce smart contract-based off-chain 
cloud service level agreements (SLA). Miao et al. [39] 
proposed a mechanism to generate challenges using block 
hashes, but the method does not guarantee that the audit 
results will not be tampered with off-chain. There are also 
some blockchain-based multiaudit models [37, 40]. 
However, their proof validation process is in smart con- 
tracts or in blockchains using proof of work, which can 
consume excessive costs of public chains or validation 
time. Zhang et al. [41] propose a certificateless public 
verification scheme against procrastinating auditors 
(CPVPA) by using blockchain technology. CPVPA is built 
on certificateless cryptography and is free from the cer- 
tificate management problem. This scheme mitigates the 
impact of the TPA’s laziness on the audit. To solve the 
problem of repeated auditing of data shared by multiple 
tenants, Xu et al. [42] propose a blockchain-based dedu- 
plicatable data auditing mechanism, which also works out 
the problems such as high cost and reliance on trusted 
third parties in traditional approaches. Chen et al. [33] 
proposed a blockchain-based crowdsourcing auditing 
approach to achieve trustworthiness in audit results. The 
model relies on an untrusted audit committee. However, 
the scheme maintains a reputation as the voting weight for 
each auditor, which may introduce the disadvantage of 
centralization to integrity auditing. 


1.4. Organization. The rest of the paper is organized as 
follows. We discuss the preliminaries in Section 2. Section 3 
describes the subalgorithms executed by each participant 
and the scheduling framework of the scheme. The security 
analysis and formal proof are described in Section 4. Section 
5 analyses the implementation and performance. Finally, 
Section 6 concludes the paper. 


2. Preliminaries 


2.1. Bilinear Map. Let $G$ and $G_T$ be two multiplicative cyclic 
groups with a large prime order $q$. $e: G \times G \to G_T$ is a 
bilinear map with the following properties: 


(i) Bilinearity. $\forall u, v \in G$ and $\forall a, b \in Z_q^*$, it holds that 
$e(u^a, v^b) = e(u, v)^{ab}$. 

(ii) Non-degeneracy. $\exists u, v \in G$, where $u, v$ are generators 
of $G$, such that $e(u, v) \neq 1_{G_T}$. 


(iii) Computability. $\forall u, v \in G$, there exists an efficient 
algorithm to calculate $e(u, v)$. 
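
The three properties can be checked numerically in a toy model. The sketch below is an assumption made purely for illustration and offers no security: each group element $g^a$ is represented directly by its exponent $a$ modulo a prime `Q`, so the pairing reduces to a multiplication of exponents. A real deployment would use a pairing-friendly elliptic curve instead.

```python
Q = 2**61 - 1  # toy prime group order (illustrative assumption)


def element(a: int) -> int:
    """Represent g^a in G by its exponent a mod Q (insecure toy encoding)."""
    return a % Q


def power(x: int, k: int) -> int:
    """Raise an element (of G or G_T) to the k-th power: multiply its exponent by k."""
    return x * k % Q


def pairing(x: int, y: int) -> int:
    """Toy bilinear map: e(g^a, g^b) = g_T^(a*b), again stored as an exponent."""
    return x * y % Q


u, v = element(7), element(11)
a, b = 123_456_789, 987_654_321

# Bilinearity: e(u^a, v^b) == e(u, v)^(a*b)
bilinear_ok = pairing(power(u, a), power(v, b)) == power(pairing(u, v), a * b)

# Non-degeneracy: e(g, g) is not the identity of G_T (exponent 0 in this encoding)
non_degenerate = pairing(element(1), element(1)) != 0

print(bilinear_ok, non_degenerate)
```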


2.2. Complexity Assumption 


Definition 1. (Computational Diffie-Hellman (CDH) 
problem). Suppose $G$ is a multiplicative cyclic group and $g$ is a 
generator of $G$. Given the tuple $(g, g^a, g^b)$ with the unknown 
elements $a, b \in Z_q^*$, the CDH problem is to calculate $g^{ab}$. 


Definition 2. (CDH assumption). The advantage for any 
probabilistic polynomial time (PPT) algorithm $\mathcal{A}$ to solve 
the CDH problem in $G$ is negligible. It is defined as 
$Adv^{CDH}_{\mathcal{A}} = \Pr[\mathcal{A}(g, g^a, g^b) \to g^{ab} : a, b \in_R Z_q^*] \le \epsilon$. Here, $\epsilon$ 
denotes a negligible value. 


Definition 3. (Discrete logarithm (DL) problem). Given the 
tuple $(g, g^a)$, where $a \in Z_q^*$ is unknown, the DL problem is to 
calculate $a$. 


Definition 4. (DL assumption). The advantage for any PPT 
algorithm $\mathcal{A}$ to solve the DL problem in $G$ is negligible. It is 
defined as $Adv^{DL}_{\mathcal{A}} = \Pr[\mathcal{A}(g, g^a) \to a : a \in_R Z_q^*] \le \epsilon$. Here, 
$\epsilon$ denotes a negligible value. 
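
To see why both assumptions require a large group, the sketch below solves the DL problem by exhaustive search in a deliberately tiny group and then uses the recovered exponent to solve the matching CDH instance. The modulus `P`, the base `G_BASE`, and the function name are illustrative assumptions; the search is only feasible because the group is small.

```python
P = 104_729      # small prime modulus (illustrative; real groups are far larger)
G_BASE = 5       # base element of the multiplicative group mod P


def brute_force_dlog(g: int, h: int, p: int):
    """Return the smallest a >= 0 with g^a = h (mod p), or None if none exists.
    Takes up to p - 1 steps, which is hopeless for cryptographic sizes."""
    acc = 1
    for a in range(p - 1):
        if acc == h % p:
            return a
        acc = acc * g % p
    return None


secret_a, secret_b = 54_321, 777
g_a = pow(G_BASE, secret_a, P)
g_b = pow(G_BASE, secret_b, P)

# DL: recover an exponent of g_a by exhaustive search.
found = brute_force_dlog(G_BASE, g_a, P)

# CDH: once an exponent of g_a is known, g^(ab) is simply (g^b)^a.
g_ab = pow(g_b, found, P)
print(found, g_ab == pow(G_BASE, secret_a * secret_b, P))
```

The example also reflects the usual relationship between the two problems: anyone who can solve DL can solve CDH.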


2.3. Blockchain and Smart Contract. Blockchain technology 
enables decentralized peer-to-peer transactions, coordination, 
and collaboration without trust through data encryption, 
timestamps, and distributed consensus. A “smart 
contract” is simply a program that runs on the blockchain. It 
is a collection of code (its functions) and data (its state) that 
resides at a specific address on the blockchain. Smart contracts are 
typically used to automate the execution of an agreement so 
that all participants can be immediately certain of the 
outcome, without an intermediary’s involvement or time 
loss. User accounts can interact with a smart contract by 
submitting transactions that execute a function defined in 
the smart contract. Smart contracts cannot be deleted by 
default, and interactions with them are irreversible. With the 
help of blockchain’s immutability, the process of running 
smart contracts and the data they generate cannot be changed later. 
This property is essential when the outcome must be trusted 
by all parties. The scheduling part of the 
audit assignments can be stripped out of the overall audit 
logic and put into a smart contract. The parties participate in 
the audit by interacting with the contract. The contract is 
responsible for driving the audit process, collecting the 
intermediate results of each participant’s calculations, 
assigning calculation tasks to each participant, completing 
the vote tally, and outputting the final audit results. 


2.4. Two-Phase Commit. The concept of two-phase commit 
(2PC) is derived from database management systems. It is 
a standardized protocol that ensures a database commit 
is carried out correctly when the commit operation 
must be broken into two separate parts. Since our audit 
scheme is based on smart contracts, any data submitted by 
the participants is publicly available. This poses a security 
risk to the operation of the protocol. The purpose of introducing 
2PC in a public system is to ensure that the data 
submitted by each participant is kept confidential from the others. 
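
One common way to realize such a two-phase submission on a public ledger is a hash commitment followed by a later reveal. The sketch below is an assumption about how this could look and does not reproduce the exact process of the scheme: SHA-256 stands in for whatever hash the contract uses (Ethereum contracts typically use keccak256), and the `sender` binding, the nonce, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(r: int, s: int, nonce: bytes, sender: str) -> bytes:
    """Phase 1: publish only this digest; r and s stay hidden behind the nonce."""
    data = sender.encode() + nonce + r.to_bytes(32, "big") + s.to_bytes(32, "big")
    return hashlib.sha256(data).digest()


def reveal_ok(commitment: bytes, r: int, s: int, nonce: bytes, sender: str) -> bool:
    """Phase 2: the opened values are accepted only if they match the digest."""
    return hmac.compare_digest(commitment, commit(r, s, nonce, sender))


# An auditor picks its random challenge shares and a fresh blinding nonce.
r_share = secrets.randbelow(2**255)
s_share = secrets.randbelow(2**255)
blinding = secrets.token_bytes(32)
auditor = "0xAuditorAddress"  # hypothetical account identifier

digest = commit(r_share, s_share, blinding, auditor)            # submitted first
accepted = reveal_ok(digest, r_share, s_share, blinding, auditor)  # opened later
print(accepted)
```

Binding the digest to the sender is a design choice of this sketch: it stops another account from copying a commitment and replaying the matching reveal as its own.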


3. Proposed Scheme 


In this section, we introduce the components of the pro- 
posed system and then explain the subalgorithms related to 
data integrity auditing and their executors. 


3.1. System Model. The system consists of a data owner, a 
storage provider, auditors, and smart contracts, where 
there can be any number of auditors. 


(i) Data Owner. The data owner rents cloud storage 
services and outsources large amounts of data to the 
cloud storage. The data owner may be an individual or 
an organization. 


(ii) Storage Provider. The storage provider provides 
cloud storage services to the data owner. It has 
significant storage capacity and powerful comput- 
ing capability. When receiving a data auditing 
challenge, the storage provider should respond with 
an integrity proof to auditors. 


(iii) Auditor. The auditor challenges the storage provider 
and identifies the integrity of the user data by 
verifying the proof returned by the provider. 


(iv) Smart Contract. The smart contract stipulates the 
audit process. There are two smart contracts in the 
system. While AMSC (assignment management 
smart contract) manages the audit assignments, 
AASC (audit assignment smart contract) is in- 
stantiated by AMSC and performs a specific 
auditing assignment. 


3.2. Notations. To make the proposed scheme easier to 
follow, we summarize the main notations in 
Table 1. 
TABLE 1: Notations. 

Notation              Meaning
$\lambda$             A security parameter
$G$ and $G_T$         Two cyclic multiplicative groups
$g$                   The generator of $G$
$e$                   A bilinear map
$H$ and $h$           Two different hash functions
$\phi$                A pseudorandom function
$\pi$                 A pseudorandom permutation
$u$                   A random value
$x$                   The secret key
$v$                   The public key, calculated as $g^x$
$F$                   A specific file to be outsourced
$F_{id}$              An identity assigned to the file $F$
$m_i$                 One of the blocks constituting the file $F$
$\sigma_i$            The authenticator corresponding to $m_i$
$\Phi$                The set of $\sigma_i$
$t$                   The file tag of $F$
$c$                   The number of challenged blocks
$r$ and $s$           Two random numbers generated by each auditor
$r_g$ and $s_g$       Two numbers aggregated from all $r$ and all $s$, respectively, constituting the integrity challenge
$\mu$ and $\sigma$    Two values calculated from the challenge and the stored file, constituting the integrity proof


3.3. Auditing Framework. This section introduces how the 
smart contract drives the interaction of each participant and 
achieves data security checks and aggregation. Note that all 
of the steps are executed on-chain except for the data outsourcing 
indicated by the dotted line in Figure 1, which is 
done off-chain. 


(i) Deploy AMSC: AMSC is deployed at the very 
beginning. All storage providers, data owners, and 
auditors in the system listen at its 
address for events. 


(ii) Enroll File: assuming data outsourcing has been 
done off-chain, the data owner submits the file 
identifier $F_{id}$ and his storage provider’s address to 
AMSC for the file enrollment. 


(iii) Request Audit, Instantiate a New AASC, and 
Inform Audit Information: these three steps are 
performed consecutively. When an audit is 
launched, the data owner sends an auditing request 
to AMSC along with the fee he is willing to pay. The 
request includes $F_{id}$ and the number of challenged 
data blocks $c$. After a brief verification, an AASC 
instance for this file is deployed by AMSC. All 
participants listening to AMSC receive this 
event. Consequently, the data owner and his 
storage provider begin to listen to the newly 
deployed AASC’s address. 


(iv) Apply Audit: at the same time, auditors also receive 
the above event. Any interested auditor can 
apply for the audit by generating two large random 
numbers $r, s \in Z_q^*$ and submitting them to AASC 
with 2PC. A sufficient deposit is also required. 
The detailed process of this 2PC is illustrated 
in Figure 2. 


(v) Challenge: AASC adds all the submitted $r$’s into 
$r_g$ and all the submitted $s$’s into $s_g$, then sends 
$\{r_g, s_g\}$ to the corresponding storage provider as 
the challenge. 


(vi) Submit Proof: on receiving the challenge, the 
storage provider computes $\{r_g, s_g\}$ together with 
the challenged data blocks to obtain the integrity 
proof $P = \{\mu, \sigma\}$. 


(vii) Verify Proof and Ballot: after receiving $P$ from 
the storage provider, AASC distributes it together 
with the two previously generated numbers, that is, 
$\{r_g, s_g, \mu, \sigma\}$, to auditors. Each auditor determines 
the integrity of the data by checking the validity 
of $P$, then sends the result back to AASC. 


(viii) Judge: AASC compares all the received results. If 
they are consistent, this result is taken as the final 
result. The balance in the contract account (including 
the data owner’s auditing fee and the 
auditors’ deposits deducted for failing 2PC) is then 
distributed to the remaining auditors as their 
rewards. 


(ix) Arbitrate: if auditors do not reach a unanimous 
conclusion about the result, AASC sends 
$\{r_g, s_g, \mu, \sigma\}$ to the data owner, who then performs 
an arbitration to get the final auditing result. Based 
on this result, AASC distributes the balance in the 
contract account to the auditors who obtained the 
same result as the data owner as their rewards. The 
detailed process is illustrated in Figure 3. 
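
The Judge and Arbitrate steps can be sketched as a small settlement routine. This is a simplified off-chain model, not the contract code of the scheme: it assumes every auditor posts the same deposit, that the whole contract balance is split equally among the auditors who match the final result, and that any integer remainder stays in the contract. The function name `settle` and its parameters are hypothetical.

```python
def settle(results, fee, deposit, forfeited=0, owner_result=None):
    """Decide the final audit result and the payout per rewarded auditor.

    results      -- dict mapping auditor -> bool (their verification result)
    fee          -- auditing fee paid by the data owner
    deposit      -- deposit posted by each voting auditor (assumed equal)
    forfeited    -- deposits already confiscated for failing the 2PC
    owner_result -- the data owner's arbitration result, needed only on dispute
    """
    if not results:
        raise ValueError("no audit results submitted")
    distinct = set(results.values())
    if len(distinct) == 1:
        final = next(iter(distinct))          # Judge: unanimous result
    elif owner_result is None:
        raise ValueError("results disagree: arbitration by the data owner required")
    else:
        final = owner_result                  # Arbitrate: the data owner decides
    winners = sorted(a for a, r in results.items() if r == final)
    pot = fee + forfeited + deposit * len(results)
    share = pot // len(winners)
    return final, {a: share for a in winners}


unanimous = settle({"a1": True, "a2": True, "a3": True}, fee=300, deposit=100)
disputed = settle({"a1": True, "a2": False, "a3": False},
                  fee=300, deposit=100, owner_result=False)
print(unanimous)
print(disputed)
```

In the disputed case the auditor who disagreed with the arbitration receives nothing, so its deposit is effectively redistributed to the others.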


3.4. Algorithms. This section introduces the calculations that 
each participant needs to complete in an auditing 
assignment. 
FIGURE 1: Smart contract-based audit process framework. 

FIGURE 2: Two-phase commit process for applying an audit. 

FIGURE 3: Judgment and arbitration process for the final audit result. 


(1) Setup($1^\lambda$) $\to \{pk, sk\}$: the algorithm is executed by 
the data owner. With the security parameter $\lambda$, the 
data owner chooses two cyclic multiplicative groups 
$G, G_T$ with the same prime order $q$, and one 
bilinear map $e: G \times G \to G_T$. Let $g$ be the generator 
of $G$. The data owner chooses a cryptographic hash 
function $H: \{0,1\}^* \to G$, a pseudorandom function 
$\phi: Z_q^* \times \{1, 2, \ldots, n\} \to Z_q$, a pseudorandom 
permutation $\pi: Z_q^* \times \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$, a 
secure hash function $h: G_T \to Z_q$, and a random 
value $u \in G$. Then the data owner selects a random 
value $sk = x \in Z_q^*$ as the secret key and calculates the 
public key as $v = g^x$. Finally, the data owner releases 
$pk = \{q, G, G_T, e, H, h, \phi, \pi, u, v\}$ to the public and keeps 
$sk$ secret. 


(2) TagGen($F, F_{id}, pk, sk$) $\to \{\Phi, t\}$: this algorithm 
is executed by the data owner. Let the file $F = 
\{m_1, m_2, \ldots, m_n\}$ be identified by $F_{id} \in Z_q^*$, where 
$m_i \in Z_q^*$ $(i = 1, \ldots, n)$. Then, for each block, the 
data owner computes the corresponding authenticator 
$\sigma_i = (H(W_i) \cdot u^{m_i})^x$, where $H(W_i)$ is 
the identifier of $m_i$ and $W_i = F_{id} \| i$. The data owner then generates 
a file tag $t = F_{id} \| Sig_{sk}(F_{id})$, where $Sig_{sk}(F_{id})$ is the 
signature of the file. The data owner then uploads 
the data file $F$ and the corresponding data tag $\{\Phi, t\}$ to 
the storage provider, where $\Phi = \{\sigma_i\}$. 


(3) TagVerify($F, \Phi, pk, F_{id}$) $\to \{0, 1\}$: this algorithm 
is executed by the storage provider. Besides 
verifying the validity of the file identifier’s signature, 
the storage provider checks the correctness of 
each authenticator by 


$$e(\sigma_i, g) = e\left(H(F_{id} \| i) \cdot u^{m_i}, g^x\right), \tag{1}$$


and outputs the result of the authenticator verification, 
1 for true and 0 for false. 


(4) Challenge($\cdot$) $\to \{r_g, s_g\}$: this algorithm is executed 
by auditors together with AASC. Each auditor independently 
picks two large random numbers $r$ and $s$ 
from $Z_q^*$, then sends them to AASC. AASC aggregates 
all $r$’s and $s$’s into $r_g$ and $s_g$, respectively. AASC 
finally sends the two numbers to the storage provider 
as the data integrity challenge. 


(5) ProofGen($r_g, s_g, c, F, \Phi$) $\to \{\mu, \sigma\}$: this algorithm 
is executed by the storage provider. On receiving $r_g$ 
and $s_g$ together with the challenged block amount $c$, 
the storage provider calculates the challenge index 
set $I = \{u_i\}_{1 \le i \le c}$ and the random parameter set 
$\{v_i\}_{1 \le i \le c}$, where $u_i = \pi(r_g, i)$ and $v_i = \phi(s_g, i)$. The storage 
provider sets $S = e(u, v)$, $\gamma = h(S)$, and $\mu' = \sum_{j \in I} m_j v_j$. 
After calculating $\mu = \gamma \mu'$ and $\sigma = \prod_{j \in I} \sigma_j^{v_j}$, the storage 
provider responds with $P = \{\mu, \sigma\}$ as the integrity proof 
to AASC. 

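(6) ProofVerify($r_g, s_g, \mu, \sigma$) $\to \{0, 1\}$: this algorithm 
is executed by each auditor. After receiving the proof 
$P = \{\mu, \sigma\}$, $r_g$, and $s_g$ transmitted by AASC, auditors 
check whether 


$$e(\sigma^{\gamma}, g) = e\Big(\Big(\prod_{j \in I} H(W_j)^{v_j}\Big)^{\gamma} \cdot u^{\mu}, v\Big) \tag{2}$$


holds, then output the auditing result, 1 for true and 
0 for false. 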
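
The six algorithms can be exercised end to end in a toy model. The sketch below is an illustration under loud assumptions, not an implementation of the scheme: group elements are represented by their exponents modulo a prime `Q`, so the "public key" exposes the secret key and the code offers no security; SHA-256 stands in for $H$, $h$, and $\phi$; a seeded shuffle stands in for $\pi$; and aggregation is taken to be addition modulo `Q`. All function names are hypothetical, and the verifier takes a few extra arguments (block count, file identifier, public key) that the paper's ProofVerify leaves implicit.

```python
import hashlib
import random
import secrets

Q = 2**61 - 1  # toy prime order; g^a is stored as the exponent a (insecure)


def _hash_int(tag: bytes, *parts: int) -> int:
    data = tag + b"".join(p.to_bytes(8, "big") for p in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def H(file_id: int, i: int) -> int:          # H(W_i) with W_i = F_id || i
    return _hash_int(b"H", file_id, i) % Q


def h(gt: int) -> int:                       # hash from G_T to Z_q, forced non-zero
    return _hash_int(b"h", gt) % (Q - 1) + 1


def phi(key: int, i: int) -> int:            # stand-in pseudorandom function
    return _hash_int(b"phi", key, i) % Q


def pi_indices(key: int, n: int, c: int):    # stand-in pseudorandom permutation
    perm = list(range(1, n + 1))
    random.Random(key).shuffle(perm)
    return perm[:c]


def pairing(a: int, b: int) -> int:          # e(g^a, g^b) = g_T^(ab)
    return a * b % Q


def setup():
    x = secrets.randbelow(Q - 1) + 1         # secret key
    u = secrets.randbelow(Q - 1) + 1         # exponent of the random value u
    return {"u": u, "v": x}, x               # toy only: v = g^x is stored as x


def tag_gen(blocks, file_id, pk, x):         # sigma_i = (H(W_i) * u^{m_i})^x
    return [x * (H(file_id, i) + pk["u"] * m) % Q
            for i, m in enumerate(blocks, start=1)]


def challenge(shares):                       # aggregate all auditors' (r, s)
    return sum(r for r, _ in shares) % Q, sum(s for _, s in shares) % Q


def proof_gen(r_g, s_g, c, blocks, tags, pk):
    idx = pi_indices(r_g, len(blocks), c)
    coef = [phi(s_g, i) for i in range(1, c + 1)]
    gamma = h(pairing(pk["u"], pk["v"]))
    mu = gamma * sum(v * blocks[j - 1] for j, v in zip(idx, coef)) % Q
    sigma = sum(v * tags[j - 1] for j, v in zip(idx, coef)) % Q
    return mu, sigma


def proof_verify(r_g, s_g, mu, sigma, c, n, file_id, pk):
    idx = pi_indices(r_g, n, c)
    coef = [phi(s_g, i) for i in range(1, c + 1)]
    gamma = h(pairing(pk["u"], pk["v"]))
    agg = sum(v * H(file_id, j) for j, v in zip(idx, coef)) % Q
    lhs = pairing(gamma * sigma % Q, 1)                        # e(sigma^gamma, g)
    rhs = pairing((gamma * agg + pk["u"] * mu) % Q, pk["v"])   # right side of (2)
    return lhs == rhs


def demo():
    pk, sk = setup()
    blocks, file_id = [11, 22, 33, 44, 55], 42
    tags = tag_gen(blocks, file_id, pk, sk)
    shares = [(secrets.randbelow(Q), secrets.randbelow(Q)) for _ in range(3)]
    r_g, s_g = challenge(shares)
    c = n = len(blocks)                       # challenge every block in the demo
    honest = proof_verify(r_g, s_g, *proof_gen(r_g, s_g, c, blocks, tags, pk),
                          c, n, file_id, pk)
    corrupted = [blocks[0] + 1] + blocks[1:]  # the server lost the first block
    cheating = proof_verify(r_g, s_g, *proof_gen(r_g, s_g, c, corrupted, tags, pk),
                            c, n, file_id, pk)
    return honest, cheating


if __name__ == "__main__":
    print(demo())
```

Because the challenge is the sum of every auditor's share, a single share that is uniformly random and unknown to the others makes the aggregate unpredictable in this model, which is the intuition behind the joint challenge generation.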


4. Analysis of Our Scheme 


4.1. Security Model. We require our scheme to fulfill the 
following two security requirements. First, the integrity of 
the challenged files is properly verified if the storage provider 
and auditors execute the protocol honestly. Second, the 
scheme prevents a semitrusted storage provider from deceiving 
the auditors about the integrity of the challenged data. That 
is, if the storage provider does not have the intact data 
file, it cannot generate a correct proof of data integrity. The 
first security requirement is defined as follows. 


Definition 5. The proposed scheme is correct for data integrity 
checking if, for any random $r_g, s_g \in Z_q^*$, a data file $F$, 
and the corresponding tag $\Phi$, the following equation holds: 


$$\text{ProofVerify}(r_g, s_g, \text{ProofGen}(r_g, s_g, F, \Phi)) = 1. \tag{3}$$


The second security requirement aims to resist three 
attacks mentioned in [43] launched by the storage provider, 
namely forge attack, replay attack, and replace attack. In 
each of these three attacks, the semitrusted storage provider 
responds to auditors with an invalid proof. We can capture 
the requirement through a security game that covers all three 
attacks. This security game consists of an adversary $\mathcal{A}$ and a 
challenger $\mathcal{C}$. $\mathcal{A}$ plays the role of a semitrusted storage 
provider who tries to trick auditors by forging a data integrity 
proof. The game is described as follows: 


(1) $\mathcal{C}$ runs the Setup($1^\lambda$) algorithm to generate $\{pk, sk\}$, 
then releases $pk$ to $\mathcal{A}$. 


(2) $\mathcal{A}$ makes queries repeatedly to $\mathcal{C}$ for some files. $\mathcal{C}$ 
returns $\Phi \leftarrow$ TagGen($F, F_{id}, pk, sk$) to $\mathcal{A}$. 


(3) Finally, $\mathcal{A}$ outputs $\{\sigma, \mu\}$ for a data file $F$ and data tag 
$\Phi$ on the challenge $\{r_g, s_g\}$. 


We define the advantage of $\mathcal{A}$ as $Adv_{\mathcal{A}} = \Pr[\text{ProofVerify}(r_g, s_g, \mu, \sigma) = 1]$. 
We say the adversary wins the above game 
if $Adv_{\mathcal{A}}$ is non-negligible. 


Definition 6. The proposed scheme is sound if there exists 
an efficient extraction algorithm such that, whenever the adversary 
$\mathcal{A}$ outputs $\{\sigma, \mu\}$ for the data file $F$ and data tag $\Phi$ on the 
challenge $\{r_g, s_g\}$ and wins the above game, the extraction 
algorithm recovers file $F$ from $\Phi$ and $\{\sigma, \mu\}$. 


4.2. Security Analysis 


Theorem 1. (Auditing correctness). When the storage pro- 
vider stores the user’s data correctly, the proof it generates can 
be verified by auditors. 


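Proof. Given a valid proof $P = \{\mu, \sigma\}$ from the storage 
provider, the verification equation (2) in the ProofVerify 
algorithm will hold. Based on the properties of the bilinear 
mapping, the verification equation (2) can be proved correct 
by transforming its left-hand side into its right-hand side as 
follows: 


$$\begin{aligned}
e(\sigma^{\gamma}, g) &= e\Big(\Big(\prod_{j \in I} \sigma_j^{v_j}\Big)^{\gamma}, g\Big) \\
&= e\Big(\Big(\prod_{j \in I} \big(H(W_j) \cdot u^{m_j}\big)^{x v_j}\Big)^{\gamma}, g\Big) \\
&= e\Big(\Big(\prod_{j \in I} H(W_j)^{v_j} \cdot u^{\sum_{j \in I} v_j m_j}\Big)^{\gamma}, g^x\Big) \\
&= e\Big(\Big(\prod_{j \in I} H(W_j)^{v_j}\Big)^{\gamma} \cdot u^{\gamma \mu'}, v\Big) \\
&= e\Big(\Big(\prod_{j \in I} H(W_j)^{v_j}\Big)^{\gamma} \cdot u^{\mu}, v\Big).
\end{aligned} \tag{4}$$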
We use the hybrid argument technique to prove soundness, as in [15]. Hybrid arguments have been used extensively in cryptography for many years. Such an argument is essentially a sequence of transitions based on indistinguishability. First of all, we define the following games:

Game-0. Game-0 is the original game defined in Section 4.1.


Game-1. Game-1 is the same as Game-0, except that the challenger $\mathcal{C}$ keeps a local list of all the tags it has signed. If the adversary $\mathcal{A}$ ever submits a tag $\Phi$ that (1) has a valid signature under sk but (2) has not been signed by $\mathcal{C}$, then $\mathcal{C}$ announces failure and aborts.

Game-2. Game-2 is the same as Game-1, except that $\mathcal{C}$ records all responses to TagGen queries from $\mathcal{A}$. If $\mathcal{A}$ succeeds but the $\sigma$ output by $\mathcal{A}$ is not equal to $\prod_{j \in I} \sigma_j^{v_j}$, the challenger $\mathcal{C}$ announces failure and aborts.

Game-3. Game-3 is the same as Game-2, except that the challenger $\mathcal{C}$ announces failure and aborts if at least one $\mu' \neq \sum_{j \in I} v_j m_j$.

Lemma 1. If there exists an algorithm $\mathcal{A}$ that can distinguish between Game-0 and Game-1 with a non-negligible probability, then we can construct an algorithm $\mathcal{B}$ to break the existential unforgeability of the signature scheme with non-negligible advantage.

Analysis. If $\mathcal{A}$ causes $\mathcal{C}$ to abort in Game-1, then $\mathcal{A}$ has submitted a validly signed tag that $\mathcal{C}$ never signed, that is, a forgery. We can therefore use $\mathcal{A}$ to construct an algorithm $\mathcal{B}$ against the existential unforgeability of the signature scheme.


Lemma 2. If there exists an algorithm $\mathcal{A}$ that can distinguish between Game-1 and Game-2 with a non-negligible probability, then we can construct an algorithm $\mathcal{B}$ to break the computational Diffie-Hellman assumption with non-negligible advantage.

Analysis. Suppose that $g^a$ and $g^b$ are the elements of the CDH problem and we set $v = g^a$, $u = g^b$. Suppose $\mathcal{A}$ can respond with a signature $\sigma'$ that is different from the expected signature $\sigma$. We can compute

$$e(\sigma'/\sigma, g) = e(u^{\Delta\mu}, v), \qquad (5)$$

where $\Delta\mu$ denotes the difference between the $\mu$ in the adversary's response and the expected one. Therefore, we can calculate $g^{ab} = (\sigma'/\sigma)^{1/\Delta\mu}$.


Lemma 3. If there exists an algorithm $\mathcal{A}$ that can distinguish between Game-2 and Game-3 with a non-negligible probability, then we can construct an algorithm $\mathcal{B}$ to break the computational Diffie-Hellman assumption with non-negligible advantage.

Analysis. We assume that $h(\cdot)$ is a random oracle controlled by an extractor that answers the hash queries posed by the adversary. For $\eta = h(S)$ from the extractor, the adversary outputs $\{\mu, \sigma\}$ such that


seosed (awy) tr) 


jel 


Then, the extractor sets $h(S)$ to be $\eta^* \neq \eta$. The adversary outputs $\{\mu^*, \sigma\}$ such that


“o'.o)=<((T] nw") volo 
Finally, $\{\sigma, \mu' = (\mu - \mu^*)/(\eta - \eta^*)\}$ can be taken as a response to the extractor.
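The division in the last step can be checked with ordinary modular arithmetic. The sketch below is only an illustration under stated assumptions: it supposes that the response has the blinded form μ = r + η·m (mod p), which this section does not state explicitly, and it uses a toy prime modulus `P` and made-up values for `m`, `r`, `eta`, and `eta_star`. The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
# Toy prime modulus (the Mersenne prime 2^127 - 1), standing in for the
# group order p. This value is an assumption for illustration only.
P = 2**127 - 1


def extract(mu, mu_star, eta, eta_star, p=P):
    """Compute mu' = (mu - mu*) / (eta - eta*) mod p, as in the analysis above."""
    # Division in Z_p is multiplication by the modular inverse.
    return (mu - mu_star) * pow(eta - eta_star, -1, p) % p


# Hypothetical values: m is the hidden aggregate, r is the blinding term.
m, r = 123456789, 987654321
eta, eta_star = 1111, 2222          # two different random-oracle answers

# Assumed blinded responses for the two oracle answers (same r in both runs).
mu = (r + eta * m) % P
mu_star = (r + eta_star * m) % P

recovered = extract(mu, mu_star, eta, eta_star)
print(recovered == m)
```

Because the same blinding term appears in both responses, it cancels in the numerator, and the quotient is the unblinded value.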


Theorem 2. (Soundness). Assume that the computational 
Diffie-Hellman problem is hard in bilinear groups and the 
digital signature scheme is existentially unforgeable. Then no 
probabilistic polynomial-time adversary can break the 
soundness of the scheme with a non-negligible probability. 


Proof. Any adversary's advantage in Game-3 must be 0, because if there is no intact file $F$, i.e., at least one $\mu \neq \sum_{j \in I} m_j v_j$, the challenger always announces failure and aborts. According to the game sequence and Lemmas 1-3, the advantage of the adversary in the original game, Game-0, must be negligible. □
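The step from the lemmas to the theorem can be spelled out as a telescoping bound. The notation below is introduced here for illustration and does not appear in the original: $\mathrm{Adv}_i$ denotes the adversary's success probability in Game-$i$, and $\varepsilon_1, \varepsilon_2, \varepsilon_3$ denote the negligible gaps between consecutive games implied by Lemmas 1-3.

```latex
% Assumed notation: Adv_i is the success probability in Game-i;
% eps_1, eps_2, eps_3 bound the gaps between consecutive games (Lemmas 1-3).
\begin{align*}
\mathrm{Adv}_0
  &\le \mathrm{Adv}_3
     + \sum_{i=0}^{2} \left| \mathrm{Adv}_i - \mathrm{Adv}_{i+1} \right| \\
  &\le 0 + \varepsilon_1 + \varepsilon_2 + \varepsilon_3 .
\end{align*}
```

A sum of three negligible quantities is still negligible, which is the conclusion of the proof.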


4.3. Analysis of Collusion Resistance. As described in the ProofGen algorithm, $r_a$ is used as the seed of the pseudorandom function $\phi$ to generate the indexes of the blocks to be challenged. This means that if $\phi$ is secure, the indexes cannot be known without knowing $r_a$. As proven in Theorem 2, the probability that a storage provider generates a proof that passes the verification without preserving the complete data is negligible. However, if malicious auditors colluding with the storage provider could make the negotiated seed fall into a designed set, the indexes and random numbers of the challenged blocks would be generated as they expect, and the storage provider would only need to store a small part of the real data blocks to pass the ProofVerify algorithm. As long as at least one honest auditor is involved in the audit, the generation of the aggregated random numbers is not controlled by the malicious auditors, and the probability that the aggregated number happens to be in the designed set is $w/|\mathbb{Z}_p^*|$, where $w$ is the size of the set, which is negligible.
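The counting argument above can be illustrated with a toy experiment. The sketch below is a minimal illustration, not the scheme's implementation: the aggregation rule (summation modulo a prime), the tiny modulus `P = 101`, the use of HMAC-SHA256 as a stand-in for the pseudorandom function, and all numeric values are assumptions made for this example. For simplicity the honest auditor draws from all residues modulo `P`.

```python
import hashlib
import hmac

P = 101  # toy prime modulus standing in for the group order (assumption)


def aggregate(contributions, p=P):
    """Combine the auditors' numbers; summation mod p is an assumed rule."""
    return sum(contributions) % p


def challenge_indexes(seed, n_blocks, count):
    """Derive `count` distinct block indexes from `seed`.

    HMAC-SHA256 keyed with the seed is used here as a stand-in PRF.
    """
    key = seed.to_bytes(32, "big")
    indexes, counter = [], 0
    while len(indexes) < count:
        digest = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        idx = int.from_bytes(digest, "big") % n_blocks
        if idx not in indexes:      # skip repeats so the indexes are distinct
            indexes.append(idx)
        counter += 1
    return indexes


# Two colluding auditors fix their numbers in advance (hypothetical values).
malicious = [17, 58]
# Seeds the colluders would like to hit; here w = 3.
designed_set = {3, 42, 77}

# Count how many choices of the honest auditor land in the designed set.
hits = sum(aggregate(malicious + [h]) in designed_set for h in range(P))
print(hits, "of", P)

# The challenged indexes are a deterministic function of the aggregated seed.
print(challenge_indexes(aggregate(malicious + [5]), n_blocks=1000, count=5))
```

Each residue is reached by exactly one honest contribution, so the number of hits equals the size of the designed set regardless of what the colluders submit.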


4.4. Discussion on Data Owner's Trustworthiness. The only trust assumption in our scheme is that the data owner is honest. The data owner performs arbitration when the auditors do not reach a consensus on the audit results. This is different from cutting out the auditors and allowing the data owner to perform the audit directly by himself. In a system where only two parties participate, the conclusions declared by either party are unconvincing. In our scheme, arbitration is performed only when the auditors' results are inconsistent, which means that each kind of result is reached by multiple individuals. Moreover, an auditor who lied will definitely be discovered, because arbitration allows the data owner to perform the audit directly by himself. This leaves the auditors with no reason to lie, meaning that arbitration will rarely need to be enforced.


4.5. Discussion on the Employability of Two-phase Commit. Our scheme uses 2PC in two phases, Challenge Generation and Result Submission. The generation of the challenge in our scheme relies on two numbers submitted by each auditor independently, which are kept confidential from the other auditors. If a malicious auditor knew the other auditors' numbers, he could construct special numbers that prompt the smart contract to generate the challenge he intends, which would make the whole scheme fail. We therefore divide the submission of secret numbers into two steps by introducing 2PC: the first step submits the hash value, and the second step submits the corresponding hash key, i.e., the preimage of that hash value. Due to the one-way nature of hashing, a malicious auditor cannot derive the secret numbers in the first step even if he knows the hash values, and thus cannot influence the generation of the final challenge by constructing his own numbers. In the result submission phase, some lazy auditors may choose to copy other auditors' results. In the first step, an honest auditor concatenates the audit result with its blockchain address to calculate the hash value. In this way, the smart contract can determine whether an auditor has copied someone else's result by checking whether the key submitted in the second step matches the hash value submitted in the first step.
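The two-step submission can be sketched as follows. This is a minimal illustration under stated assumptions rather than the contract's actual logic: SHA-256 stands in for whatever hash function the contract uses, the addresses are made up, and the 32-byte `secret` stands for the submitted value (a secret number, or an audit result padded with a random salt, which is an assumption of this example).

```python
import hashlib
import secrets


def commit(value: bytes, address: str) -> str:
    """Step one: publish only the hash of (value || address)."""
    return hashlib.sha256(value + bytes.fromhex(address[2:])).hexdigest()


def reveal_ok(commitment: str, value: bytes, address: str) -> bool:
    """Step two: recompute the hash from the revealed value and the sender's address."""
    return commit(value, address) == commitment


# Hypothetical 20-byte blockchain addresses of two auditors.
alice = "0x" + "11" * 20
bob = "0x" + "22" * 20

secret = secrets.token_bytes(32)           # Alice's confidential value
alice_commitment = commit(secret, alice)

# Bob copies Alice's hash value during step one.
bob_commitment = alice_commitment

# In step two the check is bound to the sender's own address,
# so the copied commitment does not verify for Bob.
print(reveal_ok(alice_commitment, secret, alice))
print(reveal_ok(bob_commitment, secret, bob))
```

Binding the address into the hash is what makes a copied commitment useless: the copier cannot produce a preimage that matches under his own address.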


5. Implementation and Performance Analysis 


In this section, we discuss the performance of the proposed scheme in terms of computation cost and gas cost. We carry out a series of simulation experiments to evaluate the performance of our scheme, and the code can be found at https://github.com/TDMaker/sc-paper. Note that, since the underlying layer of our scheme is a P2P overlay network, the network traffic required to maintain it is necessarily much larger than that of end-to-end schemes, so we omit the comparison of communication costs.


5.1. Environment. The experiments were carried out on Ubuntu Desktop 20.04 with an Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz x 4 and 4 GB of RAM. For the local computation of each participant, we use the pairing-based cryptography (PBC) library [44] and the GNU multiple precision arithmetic library (GMP) [45], and we implement the simulation experiments in C. In our experiments, we use a.param as the pairing parameters of the PBC library. The smart contracts are written in Solidity and run on the Rinkeby Ethereum test net. Each participant uses programs written in JavaScript to interact with the smart contracts through the npm web3 package [46].


5.2. Computation Analysis. We analyze the computation costs of all subalgorithms of the proposed protocol. We chose the size of the data block to be 160 bits. Without loss of generality, we vary the block count from 100 to 1000 in increments of 100. TagGen and TagVerify are executed only once for a given file, and their time overhead is relatively large compared with the other algorithms, as shown in Figure 4. The rest of the algorithms need to be executed repeatedly during each audit, and their time overhead is shown in Figure 5. Setup is used to generate the system parameters. Since its time overhead is static and relatively small, we do not plot it in the figures and only note that it takes 4.715 ms, averaged over ten experiments.

Figure 4: Time overhead of TagGen and TagVerify as the number of blocks increases.

Figure 5: Time overhead of Challenge, ProofGen, and ProofVerify as the number of challenged blocks increases.


TagGen and TagVerify compute and check the data owner's outsourced authenticators, and they take much longer than the other algorithms. This time cost increases with the size of the user file, which might become quite large. Fortunately, this cost is incurred only once and can be paid offline. Challenge determines two random numbers that constitute the index sequence of the challenged blocks, which is very fast. ProofGen calculates the integrity proof by aggregating the challenged data blocks. Its time cost depends mainly on the length of the challenge sequence and increases with the number of challenged blocks. ProofVerify checks the integrity proof generated by the storage provider. Its time cost also increases with the number of challenged blocks, for the same reason as ProofGen. In our protocol, the most frequently executed algorithms are Challenge, ProofGen, and ProofVerify, which are periodically performed by the storage provider and auditors. Thus, data owners in our protocol have little workload after data outsourcing, except when arbitration is needed.


Comparison: to show the efficiency advantage of our scheme, we compare it with the schemes proposed in [24, 25, 47-49]. We list the results in Table 2. The computation cost of our scheme mainly lies in expensive operations such as multiplication, exponentiation, and pairing. Other operations, such as hashing and addition, incur only negligible costs, so we omit them when analyzing the computation cost. For simplicity, we use $T_{mul}$, $T_{exp}$, and $T_p$ to represent the overhead of a multiplication operation, an exponentiation operation, and a pairing operation on group $G$, respectively. Suppose there are $n$ blocks in total, of which $c$ blocks are challenged. The overall efficiency of the scheme depends mainly on the efficiency of the algorithms TagGen, TagVerify, ProofGen, and ProofVerify. However, since TagGen and TagVerify are run only once, their impact on the overall efficiency of the audit protocol is negligible. Therefore, we only compare the efficiency of the algorithms ProofGen and ProofVerify.
Table 2: Computation comparison with some existing schemes.

| Schemes | Proof generation | Proof verification |
| --- | --- | --- |
| Scheme in [47] | $T_p + (c+1)T_{exp} + (c-1)T_{mul}$ | $2T_p + (c+1)T_{exp} + (c+1)T_{mul}$ |
| Scheme in [48] | $2T_p + (c+1)T_{exp} + cT_{mul}$ | $cT_p + (c+1)T_{exp}$ |
| Scheme in [49] | $(c+1)T_{exp} + (c-1)T_{mul}$ | $3T_p + (c+3)T_{exp} + (c+1)T_{mul}$ |
| Scheme in [25] | $cT_{exp} + (c-1)T_{mul}$ | $2T_p + (c+2)T_{exp} + cT_{mul}$ |
| Scheme in [24] | $2T_p + (c+2)T_{exp} + (c-1)T_{mul}$ | $T_p + 3T_{exp} + cT_{mul}$ |
| Our scheme | $(c+1)T_{exp} + cT_{mul}$ | $2T_p + (c+2)T_{exp}$ |

In proof generation, our scheme takes the same number of multiplication operations as [48], but one more than the four remaining schemes. Our scheme has the same number of exponentiation operations as the first three schemes. Meanwhile, [25] has one fewer exponentiation operation than ours, and [24] has one more than ours. The pairing operation is the most time-consuming operation, but in proof generation it occurs only in [24, 47, 48]. In proof verification, the schemes in [25, 47] need two pairing operations and the scheme in [49] needs three pairing operations, whereas in the scheme in [48] the number of pairing operations is linear in the number of challenged blocks. Our scheme and the scheme in [48] save the (c + 1) multiplication operations required by the schemes in [47, 49] and need up to two fewer exponentiation operations than the scheme in [49]. Although [24] outperforms the other schemes in terms of exponentiation and pairing operations, its number of multiplication operations increases linearly with the number of challenged blocks.


Nonetheless, the above schemes all make various computational concessions in order to provide their proposed functional properties on the basis of security. Therefore, a comparison of computation cost alone can only serve as a rough reference.
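To make the comparison concrete for a given challenge size, the operation counts can be evaluated directly. The sketch below is only an illustration: the formulas are transcribed from Table 2 as read here, the function name `op_counts` and the choice c = 300 are made up for the example, and no timing values are assumed.

```python
def op_counts(c):
    """Return {scheme: (generation, verification)} for c challenged blocks.

    Each entry is a tuple (pairings, exponentiations, multiplications),
    transcribed from Table 2.
    """
    return {
        "[47]": ((1, c + 1, c - 1), (2, c + 1, c + 1)),
        "[48]": ((2, c + 1, c), (c, c + 1, 0)),
        "[49]": ((0, c + 1, c - 1), (3, c + 3, c + 1)),
        "[25]": ((0, c, c - 1), (2, c + 2, c)),
        "[24]": ((2, c + 2, c - 1), (1, 3, c)),
        "ours": ((0, c + 1, c), (2, c + 2, 0)),
    }


# Print the counts for a hypothetical challenge of 300 blocks.
for scheme, (generation, verification) in op_counts(300).items():
    print(scheme, "generation:", generation, "verification:", verification)
```

Multiplying each tuple by measured unit costs for pairing, exponentiation, and multiplication on a given machine would turn these counts into estimated running times.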


5.3. Gas Cost Analysis. Gas is the fuel that must be paid for running smart contracts on Ethereum. It measures how much "work" needs to be done for an operation or a series of operations. Gas prevents junk transactions from blocking the network and serves as additional income for miners. We deployed our smart contracts on Rinkeby [50], which is an Ethereum test net (or test network). The only difference between the case in which all auditors reach the same conclusion and the case in which they do not is an additional arbitration step performed by the data owner at the end. All other steps are exactly the same. Therefore, we only explain the case that requires the data owner's arbitration. Because any number of lying auditors can be detected as long as there exists at least one honest auditor, and because the number of dishonest auditors has no effect on the final result, we introduce only two auditors in the experiment: an honest auditor and a dishonest one.

Figure 6 illustrates such an audit assignment, where the vertical axis lists the participants and the horizontal axis shows how much gas each participant spends to execute a certain algorithm. The fee actually paid is the gas used multiplied by the gas price, which is denominated in Wei, the smallest unit of currency in Ethereum (1 Ether = $10^{18}$ Wei). Note that Inform Submit Hash Key 1, Inform Submit Hash Key 2, and Inform Proof Gen are emitted events (which are only auxiliary steps), so we did not list them in Section 3.3 for clarity. Submit Hash Key 1 and Submit Hash Key 2 are substeps of 2PC, so they have not been listed either.

Figure 6: Gas cost of each step (horizontal axis: gas cost of each step, ×100,000; vertical axis: chairperson, data owner, storage provider, Auditor1, Auditor2).

As we can see from the figure, the two algorithms with the highest gas cost are Deploy TMSC and Request Audit, because both involve the deployment of smart contracts. The large amount of gas consumed by smart contract deployment comes from two sources. On the one hand, the CREATE opcode, which is called during contract creation, costs a fixed 32,000 gas. On the other hand, the contract bytecode must be stored: more bytecode means more storage, and each byte costs 200 gas, which adds up very quickly. Fortunately, Deploy TMSC deploys the management contract for audit assignments only once in an audit system, while Request Audit is instantiated once for every audit assignment executed. The remaining operations require very little gas. In an audit assignment, each additional auditor increases the gas cost by less than 400,000. The gas overhead for the other participants is fixed, except when data owner arbitration is required, which needs an additional 100,000 gas, but this overhead is insignificant compared with the reward that can be earned.
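The two deployment cost components named above can be turned into a quick estimate. The sketch below is a rough illustration, not a gas model of the actual contracts: it uses only the fixed 32,000 gas creation cost and the 200 gas per stored byte mentioned in the text, ignores every other component of a deployment transaction, and the bytecode size and gas price in the usage lines are hypothetical.

```python
CREATE_GAS = 32_000         # fixed cost of contract creation (from the text)
CODE_DEPOSIT_GAS = 200      # gas per byte of stored contract bytecode (from the text)
WEI_PER_ETHER = 10**18      # 1 Ether = 10^18 Wei


def deployment_gas(bytecode_size: int) -> int:
    """Estimate of deployment gas from the two components only."""
    return CREATE_GAS + CODE_DEPOSIT_GAS * bytecode_size


def fee_in_ether(gas_used: int, gas_price_wei: int) -> float:
    """Fee paid = gas used x gas price (in Wei), converted to Ether."""
    return gas_used * gas_price_wei / WEI_PER_ETHER


# Hypothetical contract with 10,000 bytes of deployed bytecode.
gas = deployment_gas(10_000)
print(gas)  # 2032000

# Fee at a hypothetical gas price of 10^9 Wei per unit of gas.
print(fee_in_ether(gas, 10**9))
```

Even under this simplified estimate, the per-byte storage term dominates the fixed creation cost for any contract of realistic size, which matches the observation that the deployment steps are the most expensive ones.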


6. Conclusion 


In this paper, we design a remote data integrity auditing scheme based on Ethereum smart contracts. The challenge in this scheme is jointly generated by all Ethereum users participating in the audit. When the auditing results are inconsistent, the data owner completes the final arbitration. Security proofs and experimental results show that our scheme is secure and efficient.